Abstract—Psychological personality has been shown to affect a 
variety of aspects: preferences for interaction styles in the digital 
world and for music genres, for example. Consequently, the 
design of personalized user interfaces and music recommender 
systems might benefit from understanding the relationship between 
personality and use of social media. Since no study has 
examined the relationship between personality and use of Twitter at 
large, we set out to analyze the relationship between personality 
and different types of Twitter users, including popular users 
and influentials. For 335 users, we gather personality data, 
analyze it, and find that both popular users and influentials are 
extroverts and emotionally stable (low in the trait of Neuroticism). 
Interestingly, we also find that popular users are ‘imaginative’ 
(high in Openness), while influentials tend to be ‘organized’ 
(high in Conscientiousness). We then show a way of accurately 
predicting a user’s personality simply based on three counts 
publicly available on profiles: following, followers, and listed 
counts. Knowing these three quantities about an active user, 
one can predict the user’s five personality traits with a root- 
mean-squared error below 0.88 on a [1,5] scale. Based on 
these promising results, we argue that being able to predict 
user personality goes well beyond our initial goal of informing 
the design of new personalized applications as it, for example, 
expands current studies on privacy in social media. 

Index Terms—Web 2.0, Personality, Social Networks, Twitter 


I. INTRODUCTION 


Personality has been found to significantly correlate with a 
number of real-world behaviors. It correlates with music taste: 
popular music tends to be significantly liked by extroverts, 
while people with a tendency to be less open to experience 
tend to prefer religious music and dislike rock music [32]. 
Personality also impacts the formation of social relations [36]: 
friends tend to be, to a very similar extent, open to experience 
and extrovert [35]. 

Personality also influences how people interact online. Pre- 
vious work has shown this to be the case for Facebook users, 
but there has not been any analysis of Twitter users at scale 
(Section II). Since Twitter differs from Facebook, it would be 
beneficial to extend previous work to Twitter. To this end, we 
gather personality data for 335 Twitter users (Section III), and 
we then make two main contributions: 

• We study the relationship between the big five personality 
  traits and five types of Twitter users: listeners (those who 
  follow many users), popular (those who are followed by 
  many), highly-read (those who are often listed in others’ 
  reading lists), and two types of influentials (Section IV). 
  We find that popular users and influentials are both 
  extrovert and emotionally stable (low in the personality 
  trait of Neuroticism). Also, popular users are high in 
  Openness (they are ‘imaginative’), while influentials tend 
  to be high in Conscientiousness (they are ‘organized’). 

• We predict a user’s personality traits from three numbers 
  that are publicly available on any Twitter profile 
  (Section V): the number of profiles the user follows 
  (following), number of followers, and number of times 
  the user has been listed in others’ reading lists. We 
  find that Openness is the easiest trait to predict, while 
  Extraversion is the most difficult. Yet, the error (RMSE) 
  for Extraversion is as low as 0.88 on a [1,5] scale. 

TABLE I 
THE BIG FIVE PERSONALITY TRAITS. 

Personality trait    High scorers                 Low scorers 
Openness             Imaginative                  Conventional 
Conscientiousness    Organized                    Spontaneous 
Extraversion         Outgoing                     Solitary 
Agreeableness        Trusting                     Competitive 
Neuroticism          Prone to stress and worry    Emotionally stable 


These results not only provide insights on the personality of 
different Twitter users but also inform current research on pri- 
vacy of social media users (Section VI) and suggest practical 
applications in a variety of areas, including marketing, user 
interface design, and recommender systems (Section VII). 


II. BACKGROUND AND RELATED WORK 


The relationship between real-world social networks and 
personality has usually been studied using a personality test 
called “The Big Five”. 


A. The Big Five Personality Test 


The five-factor model of personality, or the big five, is the 
most comprehensive, reliable and useful set of personality 
concepts [7], [11]. An individual is associated with five scores 
that correspond to the five main personality traits and that form 
the acronym OCEAN (Table I collates a brief explanation 
of each trait). Imaginative, spontaneous, and adventurous 
individuals are high in Openness. Ambitious, resourceful and 
persistent individuals are high in Conscientiousness. Individuals 
who are sociable and tend to seek excitement are 
high in Extraversion [2], [5], [13], [34], [39]. Those high in 
Agreeableness are trusting, altruistic, tender-minded, and are 
motivated to maintain positive relationships with others [15]. 
Finally, emotionally labile and impulsive individuals are high 
in Neuroticism [18], [19]. 


B. Personality and Social Media 


There have been few studies on how personality impacts 
interactions on social media. These studies have mainly analyzed 
the impact of personality on: 


• Using social media sites. Extroverts tend to find social 
  media easy to use and useful [33]. 

• Selecting social contacts. Users select contacts with 
  similar Agreeableness, Extraversion, and Openness, and 
  they generally tend to prefer people high in Agreeableness 
  [35]. 

• Keeping a large number of contacts. As one might expect, the 
  personality trait that correlates the most with the number of 
  social contacts is Extraversion [28], [36]. 


These preliminary studies have been recently expanded by 
Golbeck et al. [10]. The researchers analyzed the personality 
of 167 Facebook users and successfully predicted these users’ 
five personality traits out of users’ personal information and 
posts. More recently, Quercia et al. studied the relationship 
between sociometric popularity (number of Facebook contacts) 
and personality traits on a far larger number of subjects [29]. 
They concluded that popular Facebook users tend to have 
the same personality as people popular in the real world, 
suggesting that the nature of online interactions does not 
significantly differ from that of real world interactions. Also, 
they tested a widely held conjecture: that people who have 
many social contacts on Facebook are the ones who are able 
to adapt themselves to new forms of communication, present 
themselves in likable ways, and have a propensity to maintain 
superficial relationships. They found no statistical evidence for 
such a conjecture. 

There has not been any study on the personality of Twitter 
users, and Twitter differs from Facebook: people can use 
the two platforms in very different ways, if they choose to. 
Facebook is a social networking site that generally connects 
people who already know each other (e.g., friends, family 
and co-workers): by default, two individuals 
need to be mutual friends on Facebook to fully share what 
they have been up to. Instead, Twitter is a social media site 
on which users can see just about anything about anybody, 
unless they protect their updates, which only a very tiny 
minority of active users do [24]. This important difference 
makes it possible to expand current discussions on privacy in 
social media by showing that one can accurately predict user 
personality not only from information on Facebook (whose 
access is generally restricted) but also from truly publicly 
available Twitter information. 


III. DATA COLLECTION 


To associate personality scores with Twitter users, we gather 
data from a Facebook application called myPersonality. 

Fig. 2. Distributions of the logarithm of the number of following, followers, 
and times a user has been listed. 


A. myPersonality 


More than five million Facebook users have been able to 
take a variety of genuine personality and ability tests by 
installing myPersonality (Figure 1). Users are not paid and 
are solely motivated by the prospect of receiving reliable 
personality test results. The application ensures high test result 
validity by removing the protocols that may be a product 
of inattentive, language incompetent, or randomly responding 
individuals. The resulting quality of the responses is high: the 
scales’ reliabilities are on average higher than those reported in test 
manuals, and the discriminant validity (average r = .16) is 
better than that obtained using traditional samples (average 
r = .20 [16]). 

myPersonality users can give their consent to share their 
personality scores and profile information, and around 40% 
of them choose to do so. Only a few hundred of those users 
have posted links to their Twitter accounts, though. Twitter is a 
micro-blogging site on which users send and read short mes- 
sages (up to 140 characters) called tweets. In general, tweets 
are publicly visible and are followed by subscribers called 
followers. Users of particular interest are kept in one’s reading 
list. A profile’s listed count is the number of users whose 
reading lists contain the profile’s tweets. The distributions of 
following, followers, and listed are reported in Figure 2 and 
suggest that the users in our sample are more active than the 
users in the general Twitter population, that is, our users have 
higher numbers of following and followers. 


B. Data Description 


We consider all users who specified their Twitter accounts 
on their Facebook profiles, verify the match between 
Facebook and Twitter accounts, and end up with 335 Twitter 
users. We analyze the big five personality test results for those 
users. This group is composed of 171 women (52%) and 164 
men (48%) and mirrors the gender distribution in Twitter: 
according to a study by the social media analytics company 
Sysomos in June 2009, women make up a slightly larger 
Twitter demographic than men (53% versus 47%) [6]. 
Knowing the age of 165 users, we plot their age distribution 
in Figure 3(a) and estimate a geometric average of 22.7. As 
we shall see in the next section, age is an important factor as 
it affects a user’s activity. Figure 3(b) shows the logarithms 
of following, followers, and listed for users of increasing age 
(11 bins). 


IV. PERSONALITY OF TWITTER USERS 


We identify five characteristics that define five types of 
users. By considering the publicly available ‘following’, 
‘followers’, and ‘listed’ counts (as Twitter calls them), 
we first identify three types of users: listeners (those 
who follow many users), popular (those who are followed by 
many), and highly-read (those who are often listed in others’ 
reading lists). We then identify two types of influential users using two 
influence scores named ‘Klout’ and ‘TIME’ (detailed below). 

We study the relationship between personality traits and the 
five user characteristics, that is, the logarithm of the number 
of followed users (following), followers, listings, and the two 
influence scores. We are interested in the logarithm because the 
corresponding distributions are not normal and their logarithm 
transformations (Figure 2) correct for the violation of normality. 
We study the Pearson product-moment correlation between 
the logarithm of the five user characteristics and each of the 
(big) five personality traits, plus two additional attributes, 
namely age and sex. Pearson’s correlation r ∈ [-1, 1] is 
a measure of the linear relationship between two random 
variables. 
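
The correlation analysis described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used in the study: the data values are made up, the helper names `pearson_r` and `log_count` are hypothetical, and the use of log(1 + count) to guard against zero counts is an assumption of the sketch (the text only states that logarithms are taken).

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def log_count(count):
    # Assumption: log(1 + count) so that a count of zero stays finite.
    return math.log1p(count)


# Hypothetical toy data: follower counts and Extraversion scores on [1, 5].
followers = [12, 150, 90, 2300, 40, 610]
extraversion = [2.5, 3.5, 3.0, 4.5, 2.0, 4.0]

r = pearson_r([log_count(c) for c in followers], extraversion)
print(f"r(log followers, Extraversion) = {r:.2f}")
```

The same function can be applied to each pair of user characteristic and trait to fill a table of coefficients; significance testing is left out of the sketch.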


We report the correlations in Table II², and then plot 
the relations between three types of users (popular, 
highly-read, and (Klout) influential) and the five personality traits at 
population-level in Figure 4. 

² Consider that very low correlation coefficients can still be highly statistically 
significant: the correlation Klout-N of -0.03 is a case in point. 

Fig. 1. Part of the myPersonality user interface. 

Listeners and Popular. The strongest significant correlations 
are found with Extraversion (0.13 for listeners and 0.15 for 
popular users) and with Neuroticism (-0.17 for listeners and 
-0.19 for popular users). That is reasonable since Extraversion 
and Neuroticism are predictors of the number of friends in 
the real world and the number of social contacts on Facebook [10], 
[27], [39]: in both real life and Facebook, sociometrically 
popular individuals are extroverts (high in Extraversion) and 
emotionally stable (low in Neuroticism). Also, individuals high 
in Extraversion tend to maintain persistent communication 
with their friends [2], [5], [13], [34], while those high in 
Neuroticism withdraw from others during times of stress [22], 
[23] and generally report less satisfaction with the support 
received by their social networks [18], [19]. Listeners and 
popular users also tend to be older - the correlation with age 
is 0.28 for listeners and 0.37 for popular users. This indicates 
that older individuals tend to follow more users on Twitter than 
younger ones do. In Facebook, the opposite holds - the older a 
user, the fewer his/her social contacts [10]. The different age 
effects in the two social media sites could be explained by 
differences in: 1) usage - professionals tend to predominantly 
prefer Twitter over Facebook for work matters, and they are 
those who accumulate large numbers of followers [20]; 2) age 
demographics - in 2010, the fraction of users in the 13-25 
age band was 40% on Facebook and only 17% on Twitter [14]. 


Highly-read. Those who are saved in others’ reading lists 
tend to be high in Openness, with a correlation coefficient of 
0.17. Openness is generally associated with descriptive terms 
such as imaginative, spontaneous, and adventurous and has 
been found to be positively associated with number of social 
relations in real life (r = 0.23) [38]. 


Influentials. To identify influentials, we now define two 
measures of influence. First, we use the ‘Klout’ score (klout.com/), 
a measure that is well established among social media analysts. 
This score does not consider number of followers or tweets 
but, instead, considers whether a user’s tweets are clicked, 
replied to, and further propagated (retweeted) [21]. Second, we 
employ the measure used by TIME magazine to rank public 
figures such as Barack Obama, Oprah Winfrey, and Lady 
Gaga. The measure combines one’s popularity on both Twitter 
and Facebook by computing $2\,n_{followers} + n_{facebook}$ [37], where 
$n_{followers}$ is the number of Twitter followers, and $n_{facebook}$ 
is the number of Facebook social contacts. 
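
As a small worked example, the TIME-style score can be computed as follows. This is only a sketch under stated assumptions: it reads the measure as twice the number of Twitter followers plus the number of Facebook contacts, it uses the natural logarithm for the log-transformed score, and the function names and user counts are hypothetical.

```python
import math


def time_influence(n_followers: int, n_facebook: int) -> int:
    """Assumed reading of the measure: 2 * Twitter followers + Facebook contacts."""
    if n_followers < 0 or n_facebook < 0:
        raise ValueError("counts must be non-negative")
    return 2 * n_followers + n_facebook


def log_time_influence(n_followers: int, n_facebook: int) -> float:
    """Natural log of the score (base is an assumption); needs a positive score."""
    score = time_influence(n_followers, n_facebook)
    if score == 0:
        raise ValueError("logarithm is undefined for a zero score")
    return math.log(score)


# Hypothetical users: (Twitter followers, Facebook contacts).
users = {"alice": (1200, 340), "bob": (85, 910)}
for name, (followers, contacts) in users.items():
    score = time_influence(followers, contacts)
    print(name, score, round(log_time_influence(followers, contacts), 2))
```

Because a logarithm is taken before correlating, any constant rescaling of the score would shift all values equally and leave the correlations unchanged.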

The results reported in Table II are consistent between the 
two influence scores for Extraversion - this trait is positively 
correlated with both influence scores. In addition, TIME influ- 
entials are high in the trait of Conscientiousness (0.18). This 
trait is associated with descriptive terms such as ambitious, 
resourceful and persistent [26]. 

Fig. 3. Age distribution and effects on Twitter activity. (b) User activity varies with age. 

TABLE II 
CORRELATION COEFFICIENTS BETWEEN BIG FIVE PERSONALITY TRAITS AND FIVE QUANTITIES THAT CHARACTERIZE LISTENERS, POPULAR USERS, 
HIGHLY-READ USERS, AND (Klout & TIME) INFLUENTIALS. STATISTICALLY SIGNIFICANT CORRELATIONS ARE IN BOLD AND THEIR p-VALUES ARE 
EXPRESSED WITH *’S: p < 0.001 (***), p < 0.01 (**), AND p < 0.05 (*). 

            Listeners         Popular           Highly-read    Influential    Influential 
Trait       log(Following)    log(Followers)    log(Listed)    Klout          log(TIME) 
O           0.05              0.05              0.17*          0.13           0.00 
C           0.08              0.10              0.02           0.01           0.18*** 
E           0.13*             0.15**            0.09           0.15*          0.25*** 
A           0.07              0.02              0.03           -0.17          0.06 
N           -0.17**           -0.19***          -0.03          -0.03*         -0.20*** 
log(Age)    0.28*             0.37*             0.13           0.05           0.39* 
Male        -0.05             -0.05             -0.05          -0.04          0.01 

TABLE III 
PREDICTABILITY OF THE BIG FIVE TRAITS. THE ROOT-MEAN-SQUARE 
ERROR (RMSE) FOR PREDICTED PERSONALITY SCORES. 

Trait                RMSE 
Openness             0.69 
Conscientiousness    0.76 
Extraversion         0.88 
Agreeableness        0.79 
Neuroticism          0.85 


V. PREDICTING PERSONALITY 


Given that several correlations are significant, one may wonder 
whether it would be possible to predict personality scores of 
Twitter users, including those who do not make their tweets 
publicly available. For privacy-conscious users, we cannot 
access their tweets but we can access their basic network 
properties, that is, their following, followers, and listed counts. 
Thus, we turn to the problem of predicting personality scores 
from the three counts alone. To this end, for each 
personality trait, we perform a regression analysis with 10-fold 
cross-validation with 10 iterations using M5’ Rules [42]. 
This algorithm generates decision trees with linear models at 
the leaves using the M5’ algorithm, which was introduced 
in Wang & Witten’s work [40] and enhanced the original 
M5 algorithm by Quinlan [31]. We also measure the 
root-mean-square error (RMSE), which is the square root of the mean squared 
difference between predicted values and observed values. On 
the [1,5] score scale, the maximum RMSE is 0.88 (Table III). 
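
The evaluation procedure (RMSE averaged over the folds of a k-fold cross-validation) can be sketched as below. This is a minimal illustration under stated assumptions, not the setup used in the study: a one-variable least-squares line stands in for M5’ Rules, the data are synthetic, and all function names are hypothetical.

```python
import math
import random


def rmse(predicted, observed):
    """Square root of the mean squared difference between predictions and observations."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(total / len(observed))


def fit_line(xs, ys):
    """Least-squares line y = a + b * x (a stand-in model, not M5' Rules)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = 0.0 if var_x == 0 else cov / var_x
    return mean_y - b * mean_x, b


def cross_validated_rmse(xs, ys, k=10, seed=0):
    """Average held-out RMSE over k folds of a shuffled index list."""
    if k < 2 or k > len(xs):
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        predictions = [a + b * xs[i] for i in test_idx]
        scores.append(rmse(predictions, [ys[i] for i in test_idx]))
    return sum(scores) / len(scores)


# Synthetic data: a log-count feature and a trait score clipped to [1, 5].
rng = random.Random(42)
log_followers = [rng.uniform(0.0, 8.0) for _ in range(100)]
openness = [min(5.0, max(1.0, 3.0 + 0.1 * x + rng.gauss(0.0, 0.5)))
            for x in log_followers]

print(round(cross_validated_rmse(log_followers, openness), 2))
```

Repeating the call with different `seed` values and averaging the results would mirror the repeated iterations of cross-validation mentioned in the text.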


The error is low and, to see why, compare it to the error 
that won the $1M “Netflix prize”. Netflix, an 
online DVD-rental service, launched an open competition for 
the best collaborative filtering algorithm to predict user ratings 
for films, based on previous ratings (ratings varied on the same 
scale, from 1 to 5) [25]. On 21 September 2009, the grand 
prize of $1M was given to a team that achieved a test RMSE 
of 0.8567 [3]. Our results show that, based on three publicly 
available counts, we can predict users’ personality roughly 
as accurately as state-of-the-art recommender systems predict user 
ratings for movies. 


VI. DISCUSSION 


Users of social media reveal a lot about themselves, but, 
depending on their privacy attitudes, they also choose not to 
share details they find sensitive. A few Facebook users purposely 
fake their personal information, such as dates of birth, and 
privacy-conscious mobile users have tools that allow them to 
fake their personal data (e.g., fake geographic locations) and 
then share it with mobile social-networking services [4], [30]. 

Users decide what to share and what not to based on reason- 
able expectations. The problem is that unexpected inferences 
can often be made from seemingly innocuous social media 
data. Crandall et al. showed that, from publicly available 
geo-referenced Flickr pictures, one is able to infer several 
coincidences (e.g., two people taking pictures at the same place 
and at the same time). These coincidences, in turn, reveal 
“who befriends whom” [8]. The simple act of uploading a few 
pictures on a social media site translates into implicitly and 
unknowingly disclosing one’s private social contacts. 
Fig. 4. The five personality traits for popular, highly-read, and influential users at population level. For each user type, the trait with the highest correlation 
coefficient comes with a regression line (red line). (c) Personality traits for influential users (OCEAN vs. log(Klout)). 


This example illustrates that, although users’ decisions on 
what to share might appear reasonable in the short term, they 
might well end up being unreasonable later on and, more 
worryingly, they might disrupt users’ initial social expectations. 
Recent privacy failures are telling stories of disrupted 
social expectations [9]. A few years ago, Facebook aggregated 
content in ways that made it more visible to users who could 
already access it. When a Facebook user switched to an 
“it’s complicated” relationship, the user thought that only the 
few social contacts regularly visiting his profile would notice 
the change. Suddenly, that was not true anymore. A variety 
of contacts would learn of the switch just from their streams 
of updates. This change caused a big outcry, but Facebook 
did not have to back off - the users did. Facebook founder 
Mark Zuckerberg recently contributed to the discussion and 
claimed that the rise of social networking online means that 
people no longer have an expectation of privacy, adding “we 
decided that these would be the social norms now and we 
just went for it” [17]. The result is that Facebook “users 
are now so hooked that they are unlikely to revolt against a 
gradual loosening of privacy safeguards” [1]. Another example 
comes from the site pleaserobme.com that combined data 
from Twitter and Foursquare (a service that lets people share 
their location so their social contacts can see where they 
are). This site publishes Foursquare location posts that appear 
on Twitter. The problem is that, when a user shares her 
location on Foursquare, the user thinks that only her social 
contacts on Foursquare or Twitter would notice it. But that 
changes with pleaserobme.com - the site exposes whether users 
are somewhere other than their home to the entire Internet 
community, including burglars. More generally, when sharing 
personal data (including location data), one does so in a 
specific social context and consequently has specific social 
expectations - one implicitly guesses who is more or less 
likely to come across that data. However, those expectations 
are disrupted by inferring something unexpected: in the case 
of pleaserobme.com, whether someone is home or not. It now 
turns out that there is another piece of information that can be 
unexpectedly inferred from publicly available Twitter profiles: 
user personality traits. This insight not only raises awareness 
around privacy issues in social media but also calls for a 
rethinking of current privacy protection mechanisms. 


VII. CONCLUSION 


This study has produced two main insights. First, there 
are important personality similarities and differences among 


different types of Twitter users. All user types (listeners, 
popular, highly-read, and influential users) are emotionally 
stable (low in Neuroticism), and most of them are extrovert. 
These inferences have long been supported informally by 
intuition but have been difficult to make precise. Interestingly, 
popular users tend to be ‘imaginative’, while influential users 
tend to be ‘organized’. 

The second insight is that user personality can be easily 
and effectively predicted from public data, and that suggests 
future directions in a variety of areas, including: 1) Marketing: 
Since there is a relationship between marketing strategies and 
consumer personality [28], [41], one could select ads to which 
a user is likely to be most receptive; 2) User Interface Design: 
One could match not just content but also the basic “look 
and feel” of a social media site to personality traits (this 
idea has been previously called “web site morphing” [12]); 
and 3) Recommender Systems: Given the well-established 
relationship between personality and music taste [32], music 
recommender systems might improve their predictions by also 
considering user personality. 